Issue

• Climate modeling and various environmental monitoring
and protection applications increasingly rely on
satellite measurements.

• Research application users seek good quality satellite 
data, with uncertainties and biases provided for each 
data point 


• Remote-sensing quality issues are addressed
inconsistently by different communities.
Where are we with respect to this data challenge?


“The user cannot find the data;

if he can find it, he cannot access it;

if he can access it, he doesn't know how good they are;

if he finds them good, he cannot merge
them with other data”


The Users View of IT, NAS 1989
Challenges in dealing with Data Quality


• Q: Why now? What has changed?

• A: With the recent revolutionary progress in data systems,
dealing with data from many different sensors has finally
become a reality.

Only now is a systematic approach to remote sensing
quality on the table.

• NASA is beefing up efforts on data quality. 

• ESA is seriously addressing these issues. 

• QA4EO: an international effort to bring communities 
together on data quality. 

Intercomparison of data from multiple sensors


Data from multiple sources to be used together: 

• Current sensors/missions: MODIS, MISR, GOES, OMI. 

• Future missions: ACE, NPP, JPSS, Geo-CAPE 

• European and other countries’ satellites 

• Models 

Harmonization needs: 

• It is not sufficient just to have the data from different sensors and their 
provenances in one place 

• Before comparing and fusing data, things need to be harmonized: 

• Metadata: terminology, standard fields, units, scale 

• Data: format, grid, spatial and temporal resolution, wavelength, etc. 

• Provenance: source, assumptions, algorithm, processing steps 

• Quality: bias, uncertainty, fitness-for-purpose, validation 

Danger of easy data access without proper assessment of joint
data usage: it is easy to use data incorrectly
Three projects with a data quality flavor


• We have three projects where different 
aspects of data quality are addressed. 

• We mostly deal with aerosol data 

• I’ll briefly describe them and then show 
why they are related 


6/30/2011 


EGU2011 




Data Quality Screening Service 
for Remote Sensing Data 

The DQSS filters out bad pixels for the user 

• Default user scenario 

- Search for data 

- Select science team recommendation for quality 
screening (filtering) 

- Download screened data 

• More advanced scenario 

- Search for data 

- Select custom quality screening parameters 

- Download screened data 
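
A minimal sketch of the screening step in the scenarios above, not the DQSS implementation. It assumes one integer quality flag per pixel where a higher value means better quality; the function name `screen_pixels`, the use of `None` as a fill value, and the sample numbers are all hypothetical.

```python
from typing import List, Optional, Sequence


def screen_pixels(
    values: Sequence[float],
    quality: Sequence[int],
    min_quality: int,
) -> List[Optional[float]]:
    """Keep a value only if its quality flag is at least min_quality.

    Assumes a higher flag means better quality; rejected pixels become None.
    """
    if len(values) != len(quality):
        raise ValueError("values and quality must have the same length")
    return [v if q >= min_quality else None for v, q in zip(values, quality)]


# Hypothetical pixel values and confidence flags (3 = best, 0 = worst).
aod = [0.12, 0.45, 0.80, 0.05]
confidence = [3, 1, 2, 0]

print(screen_pixels(aod, confidence, min_quality=2))
# [0.12, None, 0.8, None]
```

In the default scenario `min_quality` would come from the science team recommendation; in the advanced scenario the user supplies it.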
DQSS Ontology
(The Whole Enchilada)

[Concept map of the full DQSS ontology, created with IHMC CmapTools. Recoverable labels:]

• Classes: QualityView, QualityCriteria, MinimumQualityLevel, ScreeningAssertion,
SurfaceType, DataField, ScreeningField, SurfaceTypeField, ReferenceConstraintField,
CoordinateConstraintField, QualityLevelField, ReferenceField, DataVariable, DataSlice,
DimensionLimits, DimensionIdentifier (DimensionName, DimensionIndex), BitfieldRange,
QualityLevel, QualityLevelDirection, QualityInterpretation, ConstraintRelationship

• Properties: hasQualityCriteria, hasScreeningAssertion (at least one),
hasDataField (at least one), hasScreeningField (at least one), hasMinimumQualityLevel,
hasSurfaceType, hasQualityLevel, hasReferenceField, hasConstraintRelationship,
isRepresentedInVariable, describesDataSlice, describesQualityFor, hasDimensionLimits,
hasDimensionIdentifier, hasMinimumIndex, hasMaximumIndex, hasIndexOffset,
hasBitfieldRange, hasStartBitfield, hasStopBitfield, hasEndianness,
hasQualityLevelDirection, hasValidQualityLevel, hasIntegerQualityLevel,
hasQualityInterpretation

• Example instances: QualityView QV_TPW_NIR with screening assertions
_SA_TPW_NIR_Usefulness and _SA_TPW_NIR_Confidence; data field TPW_NIR; screening
fields TPW_NIR_Usefulness and TPW_NIR_Confidence; minimum quality levels given as
xsd:integer

• Other instances shown: PBest, PGood, PSurfStd, TAirStd, nGoodStd, nBestStd, Qual_CO,
Marginal; constraint relationships equals, lessThan, greaterThanOrEqualTo
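
The BitfieldRange concept in the map (hasStartBitfield, hasStopBitfield) suggests that some quality levels are packed into a few bits of an integer QA word. The sketch below shows one way such a range could be decoded. It is an illustration only: it assumes bit 0 is the least significant bit and that start and stop are both inclusive, and the function name `extract_bits` and the sample byte are hypothetical.

```python
def extract_bits(packed: int, start_bit: int, stop_bit: int) -> int:
    """Return the integer stored in bits start_bit..stop_bit (inclusive).

    Assumption: bit 0 is the least significant bit of the packed word.
    """
    if packed < 0 or start_bit < 0 or stop_bit < start_bit:
        raise ValueError("need packed >= 0 and 0 <= start_bit <= stop_bit")
    width = stop_bit - start_bit + 1
    mask = (1 << width) - 1
    return (packed >> start_bit) & mask


# Hypothetical QA byte: bits 2..4 hold a three-bit quality level.
qa_byte = 0b10110100
print(extract_bits(qa_byte, 2, 4))  # 5 (binary 101)
```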
DQSS Ontology (Zoom)

[Screenshot: zoomed view of the DataQualityViews concept map in IHMC CmapTools]














AeroStat: Online Platform for the
Statistical Intercomparison of Aerosols

[Workflow diagram]

• Explore & Visualize Level 3

• Compare Level 3

• Level 3 are too aggregated: switch to high-res Level 2

• Explore & Visualize Level 2

• Correct Level 2

• Compare Level 2 Before and After

• Merge Level 2 to new Level 3
IQ Qurator Model


The Qurator information model was used in our assessment of existing quality models.

• Describes a model whereby:

- data are annotated with quality annotation metadata

• QA metadata can be associated with data at varying degrees of granularity

- ex: products, collections, arrays, specific values, etc.

- this supports our interest in associating quality metadata with a data product

• Quality evidence, a measurable quantity, provides a 'clue' to the quality

- ex: hit-ratio, standard deviation, etc.

- common examples associated with statistical analysis

- often computed in QC

- would global coverage, scatter plots, etc. fit?

• Quality assertions are domain-specific functions based on quality evidence

- good, bad, ugly

- No confidence, marginal, good, best

• Quality property (aka quality dimension)

- accuracy, completeness, currency

- many dimensions of quality to consider, each with different evidence
IQ Qurator Model
Application to our Project

[Concept maps created with IHMC CmapTools. Recoverable content, as far as the labels can be read:]

• Example from the IQ Qurator paper (page 4): HitRatio is a QualityEvidence;
the instance HitRatio1 hasValue (float)

• In our project: UncertaintyErrorEstimate is a QualityEvidence. Instances shown:
#MODIS_Aeronet_Summer_UncertaintyErrorEstimate, #UEE_Process (generatedByProcess),
#MODIS_Aeronet_Summer_UEE (hasMeasure), #MODIS_Aeronet_Summer_UEE_Plot (hasProduct,
a ScatterPlot) with hasURL http://foo.png (xsd:anyURI)

• General model classes: DataEntity, DataProduct, DataPlot (MapPlot, LatLonPlot,
ScatterPlot), Process (e.g., GiovanniService), QualityEvidence, QualityAssertion,
QualityProperty

• General model properties: hasMeasure, generatedByProcess, assertionBasedOnEvidence,
describedByQualityEvidence, evidenceDescribesQualityProperty, hasQualityAssertion,
ofDataEntity, assertionOfQualityProperty, hasProduct, hasURL (xsd:anyURI)

Notes on the map:

• Created a separation between data and a plot or visualization
of that data (which is a subclassOf DataProduct)

• Process is general; it could be used to describe report activities,
QA or bias studies, research, etc.

• To support complex QualityEvidence quantifications,
hasValue is now an object property to a DataEntity. This
is in contrast to how it is modeled in the IQ Qurator
paper (see HitRatio example)



Application to Focus Area 


[Concept map of aerosol bias conditions. Recoverable labels:]

• Classes: DarkLandCondition, DarkLandAlgorithm, AerosolCondition,
AerosolLoading (LightAerosolLoading, HeavyAerosolLoading), RegionalBias,
SeasonalBias, CalibrationBias, DeepBlueBias, Season (DecemberJanuaryFebruary,
MarchAprilMay, JuneJulyAugust, SeptemberOctoberNovember), Platform, Instrument

• Surface and region nodes: Land, BrightLand, Ocean, IndianOcean; TerraAeronet also appears

• Properties: hasBiasCondition, DependsOnAerosolLoading, hasMeasuredRegion,
AppliesToRegion (RegionalBias), AppliesToSeason (SeasonalBias -> Season),
AppliesToPlatform (CalibrationBias -> Platform)


Multi-Sensor Data Synergy 
Advisor (MDSA) 

• Goal: Provide science users with clear, cogent
information on salient differences between
data candidates for fusion, merging and
intercomparison

- Enable scientifically and statistically valid
conclusions

• Develop MDSA on current missions:

- Terra, Aqua, (maybe Aura)

• Define implications for future missions
Title: MODIS Terra C5 AOD vs. Aeronet during Aug-Oct biomass burning in Central Brazil


(General) Statement: Collection 5 MODIS AOD at 550 nm during Aug-Oct
over Central South America strongly overestimates large AOD,
and in the non-burning season underestimates small AOD, as
compared to Aeronet; good comparisons are found at moderate AOD.

(Region & season characteristics): The central region of Brazil is a mix of forest,
cerrado, and pasture, and is known to have low AOD most of the year except
during the biomass burning season.

(Dominating factors leading to Aerosol Estimate bias):

1. Large positive bias in the AOD estimate during the biomass burning season may
be due to wrong assignment of aerosol absorbing characteristics.

(Specific explanation) A constant Single Scattering Albedo of ~0.91 is
assigned for all seasons, while the true value is closer to ~0.92-0.93.

[Notes or exceptions: Biomass burning regions in Southern Africa do not show as large
a positive bias as in this case. It may be due to different optical characteristics or single
scattering albedo of smoke particles; Aeronet observations of SSA confirm this.]

2. Low AOD is common in the non-burning season. In low-AOD cases, biases
are highly dependent on lower boundary conditions. In general, a negative
bias is found due to uncertainty in Surface Reflectance Characterization,
which dominates if the signal from atmospheric aerosol is low.

(Example): Scatter plot of MODIS AOD at 550 nm vs. Aeronet from
ref. (Hyer et al., 2011). (Description Caption) Shows severe overestimation of
MODIS Col 5 AOD (dark target algorithm) at large AOD at 550 nm during Aug-Oct
2005-2008 over Brazil. (Constraints) Only best quality MODIS data
(Quality = 3) used. Data with scattering angle > 170 deg excluded. (Symbols)
Red lines define regions of Expected Error (EE); green is the fitted slope.
Results: Tolerance = 62% within EE; RMSE = 0.212; r2 = 0.81; Slope = 1.00
For low AOD (0.2) Slope = 0.3. For high AOD (> 1.4) Slope = 1.54

[Scatter plot; x-axis: Aeronet AOD]

Reference: Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth data set for data assimilation by filtering, correction,
and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 4, 379-408, doi:10.5194/amt-4-379-2011









FACETS OF DATA QUALITY 
Quality Control vs. Quality Assessment


• Quality Control (QC) flags in the data (assigned by
the algorithm) reflect the “happiness” of the retrieval
algorithm, e.g., all the necessary channels indeed
had data, there were not too many clouds, the algorithm
converged to a solution, etc.

• Quality Assessment (QA) is done by analyzing the data
“after the fact” through validation, intercomparison
with other measurements, self-consistency checks, etc. It is
presented as bias and uncertainty. It is reported rather
inconsistently and is scattered across papers and
validation reports.
Different kinds of reported and
perceived data quality


• Pixel-level Quality (reported): algorithmic guess at the usability
of a data point (some say it reflects the algorithm's “happiness”)

- Granule-level Quality: statistical roll-up of Pixel-level Quality

• Product-level Quality (wanted/perceived): how closely the
data represent the actual geophysical state

• Record-level Quality: how consistent and reliable the data
record is across generations of measurements


Different quality types are often erroneously assumed to have the
same meaning.

Ensuring data quality requires a different focus and different
actions at each of these levels.
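
The granule-level "statistical roll-up" mentioned above can be pictured as the share of pixels at each quality level. This is a sketch under that assumption only; actual products may define granule-level quality differently, and the name `granule_rollup` and the sample flags are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def granule_rollup(pixel_flags: Sequence[int]) -> Dict[int, float]:
    """Return the fraction of pixels at each quality level in a granule."""
    if not pixel_flags:
        raise ValueError("granule has no pixels")
    counts = Counter(pixel_flags)
    total = len(pixel_flags)
    return {level: counts[level] / total for level in sorted(counts)}


# Hypothetical pixel-level flags for one small granule.
print(granule_rollup([3, 3, 2, 0]))
# {0: 0.25, 2: 0.25, 3: 0.5}
```

Note that such a summary says how the algorithm rated its own retrievals, not how close the product is to the geophysical truth.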
Percent of Biased Data in MODIS Aerosols Over
Land Increases as Confidence Flag Decreases

[Stacked bar chart, 0% to 100%: share of Compliant*, Biased Low, and Biased High data
for each confidence flag: Very Good, Good, Marginal, Bad]

*Compliant data are within ± 0.05 ± 0.2τ Aeronet


Statistics from Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth
data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5
optical depth retrievals, Atmos. Meas. Tech., 4, 379-408, doi:10.5194/amt-4-379-2011
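
The three categories in the chart can be reproduced for matched MODIS/Aeronet pairs with a few lines of code. The sketch below reads the footnote as an envelope of half-width 0.05 + 0.2τ around the Aeronet value, with τ taken as the Aeronet AOD; that reading, the function names, and the sample pairs are assumptions, not taken from Hyer et al.

```python
from typing import Dict, Sequence, Tuple


def classify(modis_aod: float, aeronet_aod: float,
             offset: float = 0.05, slope: float = 0.2) -> str:
    """Label one matchup relative to the envelope offset + slope * Aeronet AOD."""
    envelope = offset + slope * aeronet_aod
    difference = modis_aod - aeronet_aod
    if difference > envelope:
        return "biased high"
    if difference < -envelope:
        return "biased low"
    return "compliant"


def compliance_fractions(pairs: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Fraction of (MODIS, Aeronet) pairs falling in each category."""
    if not pairs:
        raise ValueError("no matchups given")
    counts = {"compliant": 0, "biased low": 0, "biased high": 0}
    for modis_aod, aeronet_aod in pairs:
        counts[classify(modis_aod, aeronet_aod)] += 1
    return {label: n / len(pairs) for label, n in counts.items()}


# Hypothetical matchups: (MODIS AOD, Aeronet AOD).
pairs = [(0.10, 0.10), (0.50, 0.20), (0.05, 0.40), (1.00, 0.90)]
print(compliance_fractions(pairs))
# {'compliant': 0.5, 'biased low': 0.25, 'biased high': 0.25}
```

Running this separately for each confidence flag would give one bar of the chart per flag.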




General Level 2 Pixel-Level Issues 




• How to extrapolate validation knowledge about selected Level 2 pixels to 
the Level 2 (swath) product? 

• How to harmonize terms and methods for pixel-level quality? 


AIRS Quality Indicators

Flag  Quality      Recommended purpose
0     Best         Data Assimilation
1     Good         Climatic Studies
2     Do Not Use

Match up the Purpose recommendations?


MODIS Aerosols Confidence Flags

Flag  Ocean        Land
3     Very Good    Very Good
2     Good         Good
1     Marginal     Marginal
0     Bad          Bad

Use these flags in order to stay within expected error bounds:

Ocean: ±0.03 ± 0.10 τ
Land:  ±0.05 ± 0.15 τ
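
The two flag schemes run in opposite directions (AIRS: 0 is best; MODIS: 3 is best), which is one concrete obstacle to harmonizing pixel-level quality terms. The sketch below puts both on a common 0 to 1 scale where 1 is best. The linear rescaling is purely illustrative and does not claim that, for example, AIRS "Good" and MODIS "Good" are scientifically equivalent; the class `QualityScale` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class QualityScale:
    name: str
    labels: Dict[int, str]   # integer flag -> label from the tables above
    best_is_highest: bool    # direction of the scale

    def normalized(self, flag: int) -> float:
        """Map a flag to [0, 1], where 1 is the best level of this scale."""
        if flag not in self.labels:
            raise KeyError(f"{flag} is not a valid {self.name} flag")
        lowest, highest = min(self.labels), max(self.labels)
        fraction = (flag - lowest) / (highest - lowest)
        return fraction if self.best_is_highest else 1.0 - fraction


AIRS = QualityScale("AIRS", {0: "Best", 1: "Good", 2: "Do Not Use"},
                    best_is_highest=False)
MODIS = QualityScale("MODIS", {3: "Very Good", 2: "Good", 1: "Marginal", 0: "Bad"},
                     best_is_highest=True)

for scale in (AIRS, MODIS):
    for flag, label in sorted(scale.labels.items()):
        print(scale.name, flag, label, round(scale.normalized(flag), 2))
```

Recording the direction explicitly, as the DQSS ontology does with a quality level direction, avoids silently treating a "0" from one sensor like a "0" from another.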
Spatial and temporal sampling - how to quantify to make
it useful for modelers?


MODIS Aqua AOD July 2009


MISR Terra AOD July 2009



• Spatial sampling patterns are different for MODIS Aqua and MISR Terra:
“pulsating” areas over ocean are oriented differently due to different
orbital direction during day-time measurement -> Cognitive bias
Addressing Level 3 data “quality”





• Terminology: Quality, Uncertainty, Bias, Error budget, etc. 

• Quality aspects (examples): 

-Completeness: 

• Spatial (MODIS covers more than MISR) 

• Temporal (Terra mission has been longer in space than Aqua) 

• Observing Condition (MODIS cannot measure over sun glint while MISR can) 

-Consistency: 

• Spatial (e.g., not changing over sea-land boundary) 

• Temporal (e.g., trends, discontinuities and anomalies) 

• Observing Condition (e.g., exhibit variations in retrieved measurements due to the viewing 
conditions, such as viewing geometry or cloud fraction) 

- Representativeness: 

• Neither pixel count nor standard deviation fully express representativeness of the grid cell 
value 
Some differences in L3 are due to different
processing


• Spatial and temporal binning (L2 -> L3 daily) leads to
aggregation bias:

- Measurements (L2 pixels) from one or more orbits can go into a
single grid cell -> different within-grid variability

- Different weighting: pixel counts, quality

- Thresholds used, e.g., > 5 pixels

• Data aggregation (L3 daily -> L3 monthly -> regional ->
global):

- Weighting by pixel counts or quality

- Thresholds used, e.g., > 2 days

While these algorithms have been documented in ATBDs, reports and
papers, the typical data user is not immediately aware of how a given
portion of the data has been processed, or of the resulting impact.
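
A small numeric illustration of how the weighting choice alone changes an aggregated value. It is a sketch with made-up numbers, not any mission's actual L3 algorithm; the function names and the `min_pixels` threshold parameter are hypothetical.

```python
from typing import Sequence, Tuple

Cell = Tuple[float, int]  # (mean AOD in the grid cell, number of L2 pixels)


def unweighted_mean(cells: Sequence[Cell]) -> float:
    """Average grid-cell values, treating every cell equally."""
    if not cells:
        raise ValueError("no cells given")
    return sum(value for value, _ in cells) / len(cells)


def pixel_weighted_mean(cells: Sequence[Cell], min_pixels: int = 0) -> float:
    """Average grid-cell values weighted by pixel count.

    Cells with min_pixels or fewer pixels are dropped before averaging.
    """
    kept = [(value, n) for value, n in cells if n > min_pixels]
    if not kept:
        raise ValueError("no cells pass the pixel-count threshold")
    total_pixels = sum(n for _, n in kept)
    return sum(value * n for value, n in kept) / total_pixels


# Two hypothetical cells: a well-sampled clean cell and a sparse hazy one.
cells = [(0.10, 90), (0.50, 10)]
print(round(unweighted_mean(cells), 2))      # 0.3
print(round(pixel_weighted_mean(cells), 2))  # 0.14
```

The same two cells give 0.30 or 0.14 depending only on the weighting, which is why the processing history needs to travel with the data.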
Case 1: MODIS vs. MERIS

Same parameter

Same space & time

Different results - why?

[Figure: MODIS and MERIS aerosol optical depth compared]

A threshold used in MERIS processing effectively excludes high aerosol
values. Note: MERIS was designed primarily as an ocean-color instrument, so aerosols are
“obstacles”, not signal.





Case 2: Aggregation 


AOD difference between sensors 


MODIS Terra only AOD: difference
between different aggregations

Globally averaged AOD over ocean: Terra

Mishchenko et al., 2007

Levy, Leptoukh, et al., 2009

The AOD difference can be up to 40% due to differences in
aggregation



Case 3: DataDay definition

MODIS-Terra vs. MODIS-Aqua: Map of AOD temporal correlation, 2008

[Global map of Correlation(A,B), 01Jan2008 - 31Dec2008; color scale from -1 to 1]
A: MOD08_D3.005 Aerosol Optical Depth at 550 nm (unitless)
B: MYD08_D3.051 Aerosol Optical D

MODIS Level 3 dataday definition leads to artifact in correlation
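
To see how a day-assignment mismatch alone can distort a temporal correlation, consider a single grid cell with daily values from two sensors. The sketch below computes a Pearson correlation that skips days where either value is missing, then repeats it with one series shifted by a day. The one-day shift is only a stand-in for a dataday mismatch, not the actual MODIS dataday rule, and the series are made up.

```python
import math
from typing import Optional, Sequence


def pearson(a: Sequence[Optional[float]],
            b: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson correlation over days where both series have a value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily AOD at one grid cell.
terra = [0.1, 0.5, 0.1, 0.5, 0.1, 0.5]
aqua_same_day = list(terra)
aqua_shifted = terra[1:] + [None]  # same signal, assigned to the previous day

print(round(pearson(terra, aqua_same_day), 2))  # 1.0
print(round(pearson(terra, aqua_shifted), 2))   # -1.0
```

Identical physical signals go from perfectly correlated to perfectly anti-correlated in this extreme toy case, purely from how days were assigned.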
Conclusion

• Quality is very hard to characterize; different groups will
focus on different and inconsistent measures of quality.

• Products with known Quality (whether good or bad
quality) are more valuable than products with unknown
Quality.

- Known quality helps you correctly assess fitness-for-use

• Harmonization of data quality is even more difficult than
characterizing the quality of a single data product